GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large amounts of labeled training data is becoming the biggest
bottleneck in building and deploying supervised machine learning models.
Recently, the data programming paradigm has been proposed to reduce the human
cost of labeling training data. However, data programming relies on designing
labeling functions, which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.
Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Data
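The core premise of affinity coding stated in the abstract above (same-class pairs should, on average, score higher than different-class pairs under some affinity function) can be illustrated with a small sketch. This is not the GOGGLES implementation: cosine similarity stands in as a hypothetical affinity function, and the two-dimensional toy features, the noise level, and the helper names `cosine_affinity` and `mean_affinities` are assumptions made only for illustration.

```python
import math
import random


def cosine_affinity(u, v):
    """Hypothetical affinity function: cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_affinities(features, labels):
    """Return (mean same-class affinity, mean different-class affinity)
    over all unordered pairs of distinct instances."""
    same, diff = [], []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            score = cosine_affinity(features[i], features[j])
            (same if labels[i] == labels[j] else diff).append(score)
    return sum(same) / len(same), sum(diff) / len(diff)


# Toy data (assumed): two classes clustered around orthogonal directions.
rng = random.Random(0)
centers = {0: (1.0, 0.0), 1: (0.0, 1.0)}
labels = [0] * 20 + [1] * 20
features = [
    [c + rng.gauss(0.0, 0.1) for c in centers[label]] for label in labels
]

same_mean, diff_mean = mean_affinities(features, labels)
print(f"mean same-class affinity:      {same_mean:.3f}")
print(f"mean different-class affinity: {diff_mean:.3f}")
```

On data like this the same-class mean comes out well above the different-class mean, which is the separation the premise relies on. The labels are used here only to check the premise; in the setting the abstract describes, class inference is performed by a generative model with a small development set.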
ADAGIO: Interactive Experimentation with Adversarial Attack and Defense for Audio
Adversarial machine learning research has recently demonstrated the
feasibility of confusing automatic speech recognition (ASR) models by introducing
acoustically imperceptible perturbations to audio samples. To help researchers
and practitioners gain a better understanding of the impact of such attacks, and
to provide them with tools to evaluate and craft strong
defenses for their models, we present ADAGIO, the first tool designed to allow
interactive experimentation with adversarial attacks and defenses on an ASR
model in real time, both visually and aurally. ADAGIO incorporates AMR and MP3
audio compression techniques as defenses, which users can interactively apply
to attacked audio samples. We show that these techniques, which are based on
psychoacoustic principles, effectively eliminate targeted attacks, reducing the
attack success rate from 92.5% to 0%. We will demonstrate ADAGIO and invite the
audience to try it on the Mozilla Common Voice dataset.
Comment: Demo paper; for supplementary video, see https://youtu.be/0W2BKMwSfV
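The attack success rate quoted in the abstract above can be read as the fraction of attacked samples that the ASR model transcribes as the attacker's chosen phrase. The sketch below assumes that definition (exact match after case and whitespace normalization), which the abstract does not spell out. The function name `attack_success_rate` and all transcriptions are hypothetical and are not taken from ADAGIO or its evaluation.

```python
def attack_success_rate(transcriptions, targets):
    """Fraction of samples whose transcription equals the attacker's target.

    Assumption: a targeted attack "succeeds" when the ASR output matches
    the target phrase exactly, ignoring case and surrounding whitespace.
    """
    if len(transcriptions) != len(targets):
        raise ValueError("transcriptions and targets must have equal length")
    if not targets:
        raise ValueError("at least one sample is required")
    hits = sum(
        heard.strip().lower() == wanted.strip().lower()
        for heard, wanted in zip(transcriptions, targets)
    )
    return hits / len(targets)


# Hypothetical data: phrases the attacker wants the model to output.
targets = ["open the door", "call home", "play music", "send a message"]

# Hypothetical ASR outputs for the attacked audio, before any defense.
before_defense = ["open the door", "Call Home", "play music", "turn off the lights"]

# Hypothetical ASR outputs after a compression defense has been applied.
after_defense = ["open the dog", "call hum", "play musing", "turn off the lights"]

print(attack_success_rate(before_defense, targets))
print(attack_success_rate(after_defense, targets))
```

A defense is effective under this metric when the second number is much lower than the first, as with the drop the abstract reports.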